Selective Replicated Declustering for Arbitrary Queries
نویسندگان
چکیده
Data declustering is used to minimize query response times in data intensive applications. In this technique, query retrieval process is parallelized by distributing the data among several disks and it is useful in applications such as geographic information systems that access huge amounts of data. Declustering with replication is an extension of declustering with possible data replicas in the system. Many replicated declustering schemes have been proposed. Most of these schemes generate two or more copies of all data items. However, some applications have very large data sizes and even having two copies of all data items may not be feasible. In such systems selective replication is a necessity. Furthermore, existing replication schemes are not designed to utilize query distribution information if such information is available. In this study we propose a replicated declustering scheme that decides both on the data items to be replicated and the assignment of all data items to disks when there is limited replication capacity. We make use of available query information in order to decide replication and partitioning of the data and try to optimize aggregate parallel response time. We propose and implement a Fiduccia-Mattheyses-like iterative improvement algorithm to obtain a two-way replicated declustering and use this algorithm in a recursive framework to generate a multi-way replicated declustering. Experiments conducted with arbitrary queries on real datasets show that, especially for low replication constraints, the proposed scheme yields better performance results compared to existing replicated declustering schemes.
منابع مشابه
Independent Study Report: Propose Partial Replication Schemes for Replicated Declustering and Compare Their Performance
In this course, under the supervision of our professor, we have focused on the techniques for declustering data into multiple disks. Our aim is to get familiar with all the related work done till today so we have started from the very beginning of declustering which is done on relational databases and Cartesian product files [4, 5]. We have explored the literature and find out the most outstand...
متن کاملThreshold-based declustering
Declustering techniques reduce query response time through parallel I/O by distributing data among multiple devices. Except for a few cases it is not possible to find declustering schemes that are optimal for all spatial range queries. As a result of this, most of the research on declustering has focused on finding schemes with low worst case additive error. However, additive error based scheme...
متن کاملTopic 5: Parallel and Distributed Databases
Euro-Par Topic 5 addresses data management issues in parallel and distributed computing. Advances in data management (storage, access, querying, retrieval, mining) are inherent to current and future information systems. Today, accessing large volumes of information is a reality: Data-intensive applications enable huge user communities to transparently access multiple pre-existing autonomous, di...
متن کاملLatin Hypercubes: A Class of Multidimensional Declustering Techniques
The I/O subsystem is widely accepted as one of the principal bottlenecks for high performance parallel databases systems. The emergence of parallel I/O architectures has made the problem of data declustering, i.e. fragmenting a le of records and allocating the pieces to different disks, one of prime importance. This is evident from the growing activity in this area. In this study we focus only ...
متن کاملPerformance Evaluation of Grid Based Multi-Attibute Record Declustering Methods
I/O subsystem is widely accepted as one of the principal bottlenecks for high performance parallel databases systems. The emergence of parallel I/O architectures has made the problem of data declustering, i.e. fragmenting a le of records and allocating the pieces to diierent disks, one of prime importance. This is evident from the growing activity in this area. In this study we focus only on mu...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2009